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ABSTRACT 

^  We  develop  computationally  efficient  iterative  algorithms  for  finding  the  Maximum  Likelihood 
estimates  of  the  delay  and  spectral  parameters  of  a  noise-like  Gaussian  signal  radiated  from  a 
common  point  source  and  observed  by  two  or  more  spatially  separated  receivers.  We  first  consider 
the  stationary  case  in  which  the  source  is  stationary  (not  moving)  and  the  observed  signals  are  mod¬ 
eled  as  wide  sense  stationary  processes.  We  then  extend  the  scope  by  considering  a  non-stationary 
(moving)  source  radiating  a  possible  non-stationary  stochastic  signal.  In  that  context,  we  address  the 
practical  problem  of  estimation  given  discrete-time  observations.  We  also  present  efficient  methods 
for  calculating  the  log-likelihood  gradient  (score),  the  Hessian,  and  the  Fisher's  information  matrix 
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I.  INTRODUCTION 


Estimation  of  the  time  delay  between  signals  radiated  from  a  common  point  source  and 
observed  at  two  or  more  spatially  separated  receivers  is  a  problem  of  considerable  practical  interest 
(e.g.  see  [1]).  Assuming  that  the  source  signal  and  the  additive  receiver  noises  are  mutually  inde¬ 
pendent  wide  sense  stationary  Gaussian  processes  with  known  spectra,  and  that  the  observation  in¬ 
terval  is  long  compared  with  the  correlation  time  (inverse  bandwidth)  of  the  signal  and  the  noises, 
the  Maximum  Likelihood  (ML)  estimate  of  the  pairwise  differential  delay  is  obtained  by  pre-filter¬ 
ing  and  cross-correlating  the  received  signals,  and  searching  for  the  peak  of  the  cross-correlator 
output  [2]  [3],  Under  the  stated  assumptions,  the  ML  delay  estimate  is  optimal  in  the  sense  that  it  is 
asymptotically  unbiased  and  its  error  variance  approaches  the  Cramer-Rao  lower  bound. 

In  practice,  the  signal  and  noise  spectra  are  not  precisely  known.  One  is  unlikely  to  have  accu¬ 
rate  prior  information  about  signal  bandwidth,  center  frequency,  or  power  level,  and  the  statistical 
description  of  the  noise  field  is  similarly  incomplete.  In  [4]  it  has  been  shown  that  lack  of  knowl¬ 
edge  of  the  spectral  parameters  does  not  affect  the  quality  (mean  square  error)  of  the  delay  estimate. 
However,  this  is  true  only  if  the  joint  ML  estimation  of  the  delay  and  the  unknown  spectral  param¬ 
eters  is  carried  out,  and  the  estimation  errors  are  made  sufficiently  small. 

Unfortunately,  for  most  cases,  the  joint  ML  estimation  of  the  delay  and  spectral  parameters  in¬ 
volves  a  complicated  multi-dimensional  optimization,  and  therefore  it  has  not  been  attempted  in 
practice.  A  common  ad-hoc  approach  [3]  consists  of  estimating  the  signal  and  noise  spectra  (or, 
alternatively,  the  coherence  function)  prior  to  the  cross-correlation  operation.  However,  this  proce¬ 
dure  is  only  sub-optimal,  and  its  inherent  accuracy  critically  depends  on  the  method  employed  for 
spectral  estimation. 

A  further  complication  in  the  delay  estimation  problem  arises  if  the  signal  source  is  moving  rel¬ 
ative  to  the  receivers,  causing  to  a  time-varying  delay.  Measurement  of  the  delay  derivative,  that  is 
the  Doppler  time-compression,  provides  important  additional  information  concerning  source  location 
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and  track.  However,  the  time- varying  delay  causes  the  signals  observed  at  different  receivers  to  be 
jointly  non-stationary,  even  if  the  signal  at  each  receiver  output  is  stationary.  An  approximate  ML 
scheme,  under  the  assumption  of  linearly  time-varying  delay  (constant  Doppler)  was  developed  by 
Knapp  and  Carter  [5].  Basically,  it  forms  the  cross-correlation  of  one  receiver  output  with  respect 
to  a  time-delayed  and  time-scaled  version  of  the  other  receiver  output,  and  obtain  the  joint  ML  est¬ 
imate  of  the  delay  and  Doppler  parameters  by  maximizing  the  cross-correlation  response.  An  alter¬ 
native  scheme  that  does  not  depend  on  the  constant  Doppler  assumption,  and  by-pass  the  need  for 
the  complicated  time-scaling  operation  was  developed  in  [6].  It  basically  consists  of  partitioning  the 
observation  interval  into  time  segments,  obtain  the  estimate  of  the  average  delay  at  each  time  seg¬ 
ment,  and  then  apply  a  least-squares  procedure  to  convert  these  estimates  to  the  estimate  of  the 
delay,  Doppler  and  higher  order  derivatives.  If  the  duration  of  each  time  segment  is  long  compared 
with  the  correlation  time  of  the  signal  and  the  noises,  this  estimation  procedure  is  nearly  optimal  in 
the  sense  that  it  yields  an  asymptotically  unbiased  minimum  variance  estimate  of  the  delay  parame¬ 
ters.  As  pointed  out  before,  if  there  are  unknown  spectral  parameters,  they  must  be  included  in  the 
estimation  process  in  order  to  maintain  the  asymptotic  efficiency  of  the  delay  estimates,  and  this 
may  drastically  complicate  the  procedures  in  [5]  and  in  [6]. 

If  the  signal  and  noise  processes  are  not  stationary  over  the  observation  interval,  the  delay  esti¬ 
mation  problem  is  significantly  more  complicated,  and  most  of  the  analyses  (e.g.,  [2]-[6])  cannot  be 
applied.  In  a  recent  paper  [7],  Stuller  derives  the  log-likelihood  function  under  the  conditions  of 
time-varying  delay  and  possibly  non-stationary  source  signal  contaminated  by  additive  white  Gaus¬ 
sian  noises.  Basically,  it  involves  the  solution  to  an  integral  equation  for  each  value  of  the  unknown 
parameters  (delay,  Doppler,  amplitude  attenuations,  spectral  parameters)  in  the  a-priori  domain. 
Therefore,  the  full  multi-dimensional  search  for  the  ML  parameter  estimates  tends  to  be  computa¬ 
tionally  complex  and  time  consuming.  In  a  subsequent  paper  [8],  Namazi  and  Stuller  take  a  differ¬ 
ent  approach;  they  view  the  two-channel  time  delay  estimation  as  a  demodulation  problem,  where 
the  time  varying  delay  is  modeled  as  a  realization  from  a  stochastic  process.  They  propose  iterative 


-  4  - 


algorithms,  based  on  the  ML,  Maximum-A-Posteriori  (MAP),  and  Minimum-Mean-Square-Error 
(MMSE)  criteria,  for  simultaneous  estimation  of  the  source  signal  and  the  delay  process.  Each  itera¬ 
tion  cycle  requires  the  solution  of  an  integral  equation  to  obtain  the  up-dated  estimate  of  the  signal, 
and  an  integral  equation  to  obtain  the  up-dated  estimate  of  the  time-varying  delay.  In  addition  to 
being  numerically  tedius,  there  is  no  proof  of  convergence  of  these  algorithms,  and  if  they  converge 
it  is  unclear  where  they  converge  to.  Another  limitation  of  these  algorithms  is  that  they  assume 
prior  knowledge  of  the  covariance  functions  of  the  signal  and  the  delay  processes. 

A  practical  problem  that  has  been  overlooked  in  most  of  the  analyses,  is  the  use  of  sampled 
data.  If  the  received  signals  are  first  sampled  and  then  processed,  the  cross-correlation  function  can 
be  calculated  only  at  a  discrete  set  of  time-lags,  and  the  delay  estimate  is  subject  to  the  sampling 
period.  A  common  numerical  solution  is  to  apply  a  polynomial  fitting  procedure  to  interpolate 
between  adjacent  values  of  the  cross-correlation  function,  and  then  maximize  with  respect  to  a  con¬ 
tinuous  delay.  An  alternative  approach  [9]  is  to  assume  that  the  signal  and  the  noises  are  strictly 
bandlimited,  and  to  use  sinc(  )  functions  to  interpolate.  However,  these  approaches  are  ad-hoc,  they 
can  be  applied  only  under  constant  sampling  rates,  and  they  do  not  necessarily  improve  the  quality 
of  the  delay  estimate. 

In  this  paper  we  develop  computationally  efficient  algorithms  for  the  joint  estimation  of  the 
delay  and  spectral  parameters,  based  on  the  Estimate- Maximize  (EM)  method.  The  proposed  algo¬ 
rithms  are  optimal  in  the  sense  that  they  converge  iteratively  to  a  stationary  point  of  the  likelihood 
function,  where  each  iteration  increases  the  likelihood  of  the  estimated  parameters. 

The  organization  of  the  paper  is  as  follows:  In  section  II  we  consider  the  stationary  case  in 
which  the  differential  delay  is  assumed  to  be  constant  over  the  observation  interval  (stationary 
source),  and  the  signal  and  noise  statistics  are  assumed  to  be  stationary.  In  section  III  we  extend  the 
scope  by  considering  non-stationary  sources,  and  possibly  non-stationary  signals.  We  also  address 
the  problem  of  discrete-time  observations  in  a  natural  way  by  making  an  essential  use  of  the  con¬ 
tinuous-time  signals  propagating  through  the  medium.  In  section  IV  we  present  an  efficient  method 
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to  compute  the  log-likelihood  gradient  (score),  the  Hessian,  and  the  Fisher's  information  matrix 
(FIM)  of  the  underlying  stochastic  system.  These  results  can  be  used  for  efficient  implementation 
of  gradient-search  algorithms,  and  to  assess  the  mean  square  errors  of  the  ML  parameter  estimates. 
Finally,  in  section  V  we  summarize  the  results. 

II.  DELAY  ESTIMATION  -  STATIONARY  CASE 
A.  Problem  Formulation  and  Existing  Results 

Signals  radiated  from  a  stationary  (non-moving)  point  source  and  observed  in  the  presence  of 
additive  noise  by  a  pair  of  spatially  separated  receivers  can  be  mathematically  modeled  as 

Zj(t)  =  s(t)  +  v(t)  (1) 

Zj(t)  =  as(t-r)  +  Vj(t)  (2) 

where  r  is  the  time  difference  of  arrival  (TDOA)  of  the  signal  wavefront  between  the  receivers. 

Suppose  that  s(t),  Vj(t),  and  Vj(t)  are  sample  functions  from  mutually  independent,  wide  sense 
stationary  (WSS),  zero-mean  Gaussian  processes  with  the  spectral  densities  Ps(a;;^),  Py^(cx;;tf),  and 
Pv  respectively.  The  vector  $  represents  possibly  unknown  spectral  parameters  such  as  signal 
bandwidth,  center  frequency,  noise  spectral  levels,  etc. 

Given  continuous-time  observations  of  Zj(t)  and  Tj  _<  t^  Tf ,  we  want  to  find  the  ML  est¬ 
imate  of  r.  Since  the  amplitude  attenuation  a  and  the  spectral  parameters  0  are  also  unknown,  they 
must  be  included  in  the  estimation  process.  We  denote  by 


e  = 


T 

a 

9 


(3) 
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the  vector  unknown  parameters  to  be  estimated. 

Fourier  analyzing  Zj(t)  and  Z2(t), 

(4) 


fTf 

-i(<^n)  ~  ^i 

v/T  JT: 


-jwnt 

(t)  e  dt 


u„  = 


2ir 


where  T  =  Tf  -  Tj.  Assuming  that  the  observation  interval  T  is  long  compared  with  the  correlation 
time  (inverse  bandwidth)  of  the  signal  and  the  noises  (i.e.,  WT/23r  »  1),  the  ZCwj,)  = 

[ZjC^n)  n  =  1,2,...  are  statistically  uncorrelated  zero-mean  and  Gaussian  with  the  covari¬ 

ance  matrix 


P(a^n;f)  =  E(Z(u„)Z%^)} 

Ps  (wn  )  +  Pv^  (wn  ae^""  ’’  Pg  («„  ) 

ae-j^^n^PsCc^nitf)  +  Pv,K;«) 

Therefore,  the  probability  density  of  Z(ujf)  is 


az(<^n)i  = 


j  -Z*(u;n)P-Hwn;e)ZK) 

det(xP(u;n;e)]  ® 


(5) 


(6) 


Lz(C)=  ^logaZ(Wn)l 

n 


and  the  log-likelihood  function  is 


(7) 


rewritten  in  the  form 
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Lz(r)  =  c  +  2Re  {  ^  W(u,„-a,fi)  Z*(u;„)  Z/w„) 
n 

I  Re  {  J^W(u,„ia,»)Z,*(w„)  Z,(w„)  Aw} 

n 

J  Re  {  W(w;a,tf)  z/(w)Zj(w)e-'‘^  dw} 
Jw 


=  c  + 


WT/25r»l 


(I 


where  c  is  a  constant  independent  of  r,  and 


W(w;a,fl)  = 


QiPs(w;tf) 


(12) 


[a^Pv^  (w;tf)+Pv^  (w;tf )]  P^  (w;«)+Pv^  (w;«)Pv^  (w;tf ) 

The  integral  in  (11),  that  is  the  inverse  Fourier  transform  of  W(w;a,tf)Zj*(w)Zj(w),  is  termed  the 
Generalized  Cross  Correlation  (GCC)  [3].  Thus,  if  we  have  had  exact  prior  knowledge  of  a  and 
the  ML  estimate  of  the  delay  parameter  would  be  obtained  by  plotting  the  real  part  of  the  GCC  as 
a  function  of  delay,  and  search  for  its  peak,  that  is 


MaxRe{  W(w;a,tf)Zj*(w)Zj(w)e'^^  dw}  ====>  r 
r  Juj 


(13) 


As  pointed  out  by  Knapp  and  Carter  [3],  the  ML  estimator  is  identical  to  the  estimation  scheme 
developed  by  Hannan  and  Thomson  [10]  (see  also  Hamon  and  Hannan  [11]).  Other  delay  estimation 
techniques  ([12]-[13])  have  the  same  format  as  (13),  where  the  weighting  function  W(w;a,tf)  is  chosen 
to  optimize  a  selected  criterion  (e.g.,  signal  to  noise  ratio,  detection  index,  etc.).  These  methods  are 
expected  to  outperform  the  conventional  cross-correlation  instrumentation  (W(w)=l)  by  taking  full 
advantage  of  the  spectral  details  of  the  signal  and  the  noises. 

In  practice,  we  do  not  have  prior  knowledge  of  a,  and  the  spectral  description  of  the  signal  and 
the  noises  is  incomplete.  Therefore,  it  has  been  suggested  first  to  estimate  the  selected  weighting 
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function,  aru^  ihen  use  it  in  (13)  (e.g.,  see  (3],  (9],  [11],  [14]).  However,  this  approach  is  only  sub- 
optimal,  and  its  inherent  performance  critically  depends  on  the  method  employed  for  spectral  esti¬ 
mation. 

The  Cramer-Rao  lower  bound  (CRLB)  on  the  error  variance  of  any  unbiased  estimator  of  the 
delay  is  given  by  (e.g.,  see  [15],  Chapter  2) 


VAR(f)>[J-i(e)]i,  (14) 

where  J(^)  is  the  Fisher's  information  matrix  (FIM)  associated  with  ^  the  unknown  parameters  in 
the  problem.  In  [4]  it  has  been  shown  that  the  FIM  processes  the  following  block-diagonal  form 


J(€) 


h 

0 


where 


=  E{[dLz(0/dT?) 

^  Jo  [Qt^Pv j (w;tf )+Pv j (w;«)]Ps (w;tf)+Pv^ (w;«)Pv^ (<^;«) 


(15) 


(16) 


Therefore, 

VAR(f)>l/J,  (17) 

Invoking  the  asymptotic  efficiency  and  lack  of  bias  of  the  ML  estimator,  the  lower  bound  in 
(17)  is  attainable.  However,  this  is  true  only  if  the  ML  estimation  of  ^  the  unknown  parameters 
(that  is,  the  solution  to  (10))  is  performed.  If  we  first  estimate  the  spectral  weighting  function  and 
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then  substitute  it  in  (13),  we  may  obtain  a  biased  estimate  whose  error  variance  may  be  significantly 
larger  than  the  lower  bound. 

In  the  next  sub-section,  we  develop  a  computationally  efficient  iterative  algorithm  for  finding 
the  joint  ML  estimate  of  the  delay  and  spectral  parameters  without  the  need  to  solve  the  compli¬ 
cated  multi-parameter  optimation  indicated  in  (10). 

B.  Development  of  the  Algorithm 

In  this  section  we  apply  the  Estimate-Maximize  (EM)  algorithm  [16]  to  the  two-channel  time- 
delay  estimation  problem.  The  EM  algorithm  is  an  iterative  method  for  finding  the  ML  parameter 
estimates  given  incomplete  data.  It  works  with  a  complete  data  specification,  and  iterates  between 
estimating  the  log-likelihood  of  the  complete  data  using  the  observed  (incomplete)  data  and  the  cur¬ 
rent  parameter  estimates,  and  maximizing  the  estimated  log-likelihood  function  to  obtain  the  new 
parameter  estimates. 

If  we  have  had  the  observation  of  the  source  signal  s(t)  in  addition  to  z(t)  =  I^Zj(t)  Z2(t)j'^,  the 
estimation  problem  would  be  relatively  easy.  Therefore,  we  choose  as  the  complete  data  y(t)  the 
vector 


y(t)  = 


Zi(t) 

z(t) 

Z2(t) 

s(t) 

s(t) 

Ti  <  t  <  Tf 


(18) 


Fourier  analyzing  y(t). 


Z(W  ) 

X(c*^n)  = 

= 

—  n 

S«.„) 

S(a;n) 

(19) 
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where  Zi(Wj,)  ;  =  1,2  are  defined  by  (4),  and 


fTf 

S(a;n)=4l  ^ 

VT  JTj 


s(t)e  "  dt 


CJ„  = 


2t 

T 


(20) 


We  note  that 


ZiC^n)  =  S(a;n)  +  Vj(wn)  (21) 

^2^^n )  =  ®  c  ^‘*’n )  ■*■  ^2(^n)  (^2) 

where  Vi(cjn)  i  =  1,2  are  the  Fourier  coefficients  of  vj(t)  i  =  1,2.  We  further  note  that  given  S(wn), 
Zj(wn)  and  Zj(«n)  are  conditionally  independent.  Therefore,  under  the  large  WT  assunaption,  the 
log-likelihood  of  the  complete  data  is 


Ly(«)  =  ^  logqY(wn)] 


=  ^  {logfIS(a;n)]  +  logflZ^Cwn  VSfwn )]  +  logflZ^C^n  )/S(c.;n 


)]) 


(23) 


where 


lognS(wn)]  =  -  logirPs(u;n;tf)  -  |S(wn)l  VPs(wn;tf) 


(24) 


(25 
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logflZi(ci;n)/S(wn)]  =  -  logTPy^  (<*;„;«)  - 


|Z,(a;n)-S(a;n)| 


2 


logfIZj(wn)/S(wn)]  =  -  logiPy^Cwnl^) 


-JWnr 

|Z,(a;n)-ae  “  S(w„P 


Pv,(‘^n;*) 


(26) 


To  simplify  the  exposition,  we  suppose  that  the  noise  spectra  Py.(w;tf)  =  Pyj(a»)  i  =  1,2  are  per¬ 
fectly  known.  The  vector  i  affects  only  the  spectral  density  of  the  signal.  In  that  case,  substituting 
(24)  -  (26)  into  (23)  and  following  straightforward  algebraic  manipulations,  we  obtain 


i-Y(f)  “  c  +  2aRe| 


^e--'‘"""S(u;n)zJ<.„)/Py^(a,„) 


(27) 


where  c  is  a  constant  independent  of 


We  are  now  ready  to  apply  the  EM  algorithm.  Denote  by 


p(«) 

'(0 

X 

:(0 


(28) 


the  current  estimate  of  f  after  t  iterations  of  the  algorithm.  Then,  the  next  iteration  cycle  is  speci¬ 
fied  in  two  steps  as  follows: 


-  13  - 


E-step:  Compute 


{LY(e)/Z(w,).ZK),...) 


(29) 


M-step: 


MaxCX^iW)  |(^+J) 


(30) 


where  E^^^^  ^Wj),...)  denotes  the  conditional  expectation  given  the  observed  data 

Z(Wj),...,  evaluated  using  the  current  parameter  estimate 


Under  the  continuity  of  Q(.^,C)  both  and  (see  [17]),  the  algorithm  converges  to  a  station¬ 
ary  point  of  L2(f),  the  log- likelihood  of  the  observed  data,  where  each  iteration  cycle  increases  the 
likelihood  of  the  estimated  parameters.  Of  course,  as  in  all  "hill  climbing"  algorithms,  the  conver¬ 
gence  point  may  not  be  the  global  maximum  of  the  objective  function,  and  thus  several  starting 
points  (or,  alternatively,  a  grid  search  to  roughly  locate  the  global  maximum)  may  be  needed. 


Now,  substituting  (27)  into  (29) 


Q(f,|(^))  =  c  +  G(^)  (r,o)  +  h(0(#) 


(31) 


where 


GW(r,a)  =  2aRe  (^  e'-'"n’’§(wn)Wz5wn)/Pv^(u;n))  -  a*  ^  Is^^2W/Pv^(Wn)  (32) 


and 


h(0(#)  .  - 


(33) 


where 
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{S(wn)/^w,).  ^w,)....} 


|S(Wn)|*(^)  =  (|S(a,„)|VZ(Wi).Z(a;,)....) 

=  l§(wn/^V  +  {S(a;n)/^w,).  Z(a;,),...)  (3! 

We  now  recall  the  following  well-known  theorem  (e.g.,  [18],  Chap.  2):  If  S  and  Zj,  Zj,,..Zj^ 
are  jointly  Gaussian  zero-mean  random  variables,  and  Z^,  Zj,...Zj^  are  statistically  independent. 


IV 

E{S/Z„Z„...Z^).  ^E{S/Zk} 


Therefore, 


VAR{S/Z,,Zj,...Zj^ )  =  VAR{S/Zk)  -  (N-1)VAR{S) 

k-1 


E^(^)  {S(a;n)/Z("i)>ZK). •••) 

=  E^^^j  {S(o;„)/^u;„))+  ^  E^(^)  {S(u;„)/ZK)) 


“^|(0  (S(a;n)/Z(ai„)) 


VAR^^^j  {S(Wn)/Z(a;i),Z(ai,),...) 


=  VAR^(^)  {S(a;n)/Z(a;n))+  ^  VAR^^^^  {S(«n )/^‘^k »  ’  (  X! (0 


=  VAR^^^j  {S(wn)/^‘^n)) 


where  in  the  transition  to  the  third  line  of  (38)  and  (39),  we  invoked  the  statistical  independence 
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between  S(a)Q)  and  for  k^n.  Hence,  the  conditional  expectations  in  (34)  and  (35)  reduce  to: 

§(u„)(0  -  {S(wn)/Z(u;n)}  (40) 

“  E|(/)  (IS(‘^n)IVZ(Wn)} 

=  |§(w„)(Ol*  +  {S(wn)/^a;n))  (41) 

Since  S(h),|)  and  Z(ujy)  are  jointly  Gaussian,  then  using  well-known  results  (e.g.,  [18],  Chap.  2) 

^|(/)  (S(^n>/l(‘^n)} - 

=  E^(^)  {S(u;„)Z%n)}  )Z%n )) j  'zK)  (42) 

{S(a;n)/Z(u;„))  = 

=  VAR^^^^  {S(a;n)) -E|(^)  (S(«n)Z%n))K(<)  ®"n)Z%n))l  'E{^a;„)S*(u;n))  (43) 


In  view  of  (21),  (22),  and  (5) 


E^(^)  {S(u;„)Z%n))  -  Ps(a;„;^ (0)[i| ]  (44) 

E^(^)  {Z(wn)zV„))-P(wn;^^^^)  = 

Substituting  (44)  and  (45)  into  (42)  and  (43)  and  following  straightforward  matrix  manipulations. 
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^|(/)  (S(wn)/ZK))  = 


PsK;^(0) 


:,(a;n)+a(<)e^  "  p  y,;.  \Z,(a>n 


Pv,K) 


Pv,("n) 


PsK;^WHPv  (a»n) 


(46) 


{S(wn)/Z(a;n)}  = 


Pv^(a;„)Ps(a;„;|(0) 


Pv  (Wr,  ) 

Pvj(‘^n) 


Ps(a;n;^(^))+Pv  (wn) 


Now,  in  view  of  (31),  the  maximization  in  (30)  is  decomposed  into 


(47) 


Max  G^^)(r,a)  =*===>  ,  q(^+0 


T,a 


(48) 


MaxH(^)(«)  — ==>  #(^+1) 


(49) 


We  observe  that  G(^)(r,a)  (Eq.(32))  is  a  quadratic  function  of  a,  and  the  a  that  maximizes 
G^^)(r,a)  (for  any  pre-specified  r)  is  given  by 


Rd 


a(r)  = 


^  e"-'""'^  §(w„  )(0z*(wn  )/Pv,("n ) 


^|SK)l*^^VPv^(u;n] 


(50) 


Therefore,  the  two-parameter  maximization  in  (48)  can  be  carried  out  in  two  steps  as  follows: 


MaxGW(r,a(r))  ====> 


(51) 
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S«+0  =  a(?(^+‘)) 


where  we  note  that 


G(^)(r,a(r))  = 


ReJ 


^  §  (Wn  )(^)  z/(a;„  )/Pv^  (a;„  ] 


Incorporating  all  the  above  considerations,  the  algorithm  assumes  the  form: 


E-step:  Compute 


1+a 


Pv  (Wn) 
Pvj(^n) 


PsK;^^^^)+Pv,K) 


\S(wnr^f-)  =  |§K)W|^ 


Pv,K)PsK:^(^^) 


Pv  (Wn) 


PvjK) 


PsKi^^^VPv,^) 


M-step: 


Max 

T 


Re.  §((.;„)«)  zJwn)/Pv,K) 


_ ^===>f(<+l) 


(52) 


(53) 


(54) 


(55) 


(56) 
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U^+l)  = 


I' 


Re-^  ^  e 
n 


§(c^n)(0  zJwnVPv  (wn) 


(57) 


^|S(w„)|*(^)/Pv^(a;„] 


Min  ^ 
n 


^(/+1) 


(58) 


Perhaps  the  most  striking  feature  of  the  algorithm  is  that  it  decomposes  the  spectral  parameter 
optimization  from  the  delay  optimization,  resulting  in  a  considerable  simplification  in  the  computa¬ 
tions  involved.  Since  the  algorithm  is  based  on  the  EM  method,  it  must  converge  to  the  solution  of 
(10)  or,  at  least,  to  a  stationary  point  of  L2(()- 

We  note  that  Eqs.(56)  and  (57)  are  the  solution  to  the  ML  problem  of  estimating  r  and  q  assum¬ 
ing  that  the  source  signal  is  known  to  the  observer,  where  S(wq)  and  |S(wn)|*  are  substituted  by  their 
current  estimates  §(<»;„ )(^)  and  |S((«^)|*^^),  respectively.  Similarly,  the  optimization  in  (58)  yields  the 
solution  to  the  ML  problem  of  estimating  the  spectral  parameters,  where  the  sufficient  statistic 
|S(b)j|)|^  is  substituted  by  its  current  estimate.  Thus,  the  algorithm  iterates  back  and  forth,  using  the 
current  parameter  estimates  to  improve  the  signal  estimate  and  thus  to  improve  the  next  parameter 
estimate.  The  algorithm  is  illustrated  in  Figure  1.  We  note  that  §(ci^n  )(0  and  are,  in 

fact,  the  outputs  of  the  (non-causal)  Wiener  filter  applied  to  the  two-channel  data. 

As  indicated  by  (15),  the  quality  of  the  delay  estimate  is  unaffected  by  lack  of  exact  knowledge 
of  the  signal  spectral  parameters.  Therefore,  if  we  are  essentially  interested  in  the  delay  estimation 
and  we  are  close  to  the  point  of  convergence,  we  may  consider  performing  a  partial  M-step,  leaving 
the  spectral  estimates  at  their  current  value  and  updating  only  the  estimates  of  r  and  a.  By  that  we 
may  save  in  computations  with  only  insignificant  effect  on  the  rate  of  convergence  of  the  algorithm 
and  the  quality  of  the  resulting  delay  estimate. 
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An  initial  estimate  may  be  obtained  by  first  estimating  the  spectral  parameters  of  the  signal  at 
one  receiver  output,  and  then  use  these  estimates  to  construct  the  initial  estimate  of  the  amplitude 
attenuation  and  the  delay  parameters. 

C.  Simulation  Results 

Consider  the  following  example  taken  from  (14]:  The  observed  data  are  generated  by  sampling 
( 1  )-(2)  at  a  constant  rate  that  is  sufficiently  high  to  preserve  the  spectral  structure  of  the  continuous 
waveforms.  The  additive  noises  are  assumed  to  be  spectrally  white  with 

Pv^(<u)  =  Pv^(w)  =  a*  -  x/At  <  (J  <  x/At 

and  the  source  signal  is  modeled  as  an  all-pole  process  of  order  3,  with  the  spectral  density 

Ps(")  *  Tf*  |l+tfiexpjwAt+tf,expj2wAt4^3expj3<i;Atl"*  -  x/At  <  w  <  x/At 

where  At  =  T/N  is  the  sampling  period. 

Suppose  that  7*  and  a*  are  known  constants.  The  unknown  parameters  are  {=(r,a,S)^  where 

The  algorithm  has  been  tested  using  the  following  set  of  numerical  values 

N  =  1024 

7*  =  1 

ff*  =  8.97995  (SNR=1) 

r/At  =  10.5 

a  =  1 

-  1.77,  1.593,  fij^.7047 


1 
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In  Tables  1  and  2  of  Figure  2,  we  have  tabulated  the  outcomes  of  the  algorithm  for  two  initial 
guesses  of  the  delay  r(o)/At  =  10.0,  11.0.  The  initial  guess  of  the  amplitude  and  spectral  parame¬ 
ters  is:  *  0.7,  ^  =  -1.8,  $  =  1.7,  ^  =  -0.8.  For  reference,  we  have  also  calculated 

the  log-likelihood  (£<l-(8)),  normalized  by  the  number  N  of  data  points,  along  the  iterations. 

The  CRLB  (Eq.  (16))  on  the  rms  error  of  the  delay  estimate  is  o(f /At)  =  0.07.  We  see  that  after 
few  iterations,  the  algorithm  converges  within  the  CRLB  to  the  ML  estimate  of  the  delay  and  spec¬ 
tral  parameters. 

In  Tables  3  and  4,  we  have  tabulated  the  results  using  the  same  r(o)/At  =  10.0,  11.0,  and  the 
following  choice  of  initial  estimates:  =  0.7,  $  =  -2.2,  ?  =  2.2,  ^  =  -1.5.  As  we  see, 

after  few  iterations,  the  algorithm  converges  within  the  CRLB  to  the  ML  delay  estimate,  although 
the  amplitude  and  spectral  estimates  exhibit  large  errors  (perhaps  due  to  slow  converge  rate,  or  the 
convergence  to  a  stationary  point  of  the  likelihood  function). 

D.  Generalization  to  Multi-Receivers 

Consider  the  generalization  of  the  proposed  algorithm  to  the  estimation  of  vector  delays  obta¬ 
ined  with  an  array  of  M  spatially  distributed  r«:eivers.  The  observed  signals  are 

Zi(t)  =  aj  s(t-ri)  +  vj(t)  i=l,...(M-l)  (59) 

zm(0  =  +  '’m(*)  (60) 

where  rj's  are  the  TDOA's  relative  to  the  M'th  receiver. 

We  suppose  that  s(t),  Vj(t),...vj^(t)  are  sample  functions  from  mutually  independent,  WSS  zero- 
mean  Gaussian  processes  with  the  spectral  densisties  Ps(w;tf),  Pv^(a;),...  (w),  respectively.  To 

simplify  the  exposition,  we  have  assumed  that  Pvj(w)  i=l,...M  are  perfectly  known. 
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Given  continuous-time  observation  of  2(tHZi(t)»— Tj  <  t  <  Tf ,  we  want  to  find  the  ML 
estimate  of  the  vector  delays  tKi’i.  -t'm-I  jointly  with  the  amplitude  attenuations  a=(aj,...Q!ji^_  j  )T 
and  the  unknown  signal  spectral  parameters  #.  Denote  by 


f  = 


(61) 


the  vector  unknown  parameters  to  be  estimated. 

Even  if  $  and  a  are  known,  the  search  for  the  ML  estimation  of  r  is  a  relatively  complicated 
maximization  in  (M-1)  unknowns.  If  our  ultimate  objective  is  to  estimate  source  location  parame¬ 
ters  (i.e.,  bearing  and  range),  we  can  either  estimate  the  vector  delays  and  convert  it  to  bearing  and 
range  estimate,  or  to  maximize  the  log-likelihood  function  directly  with  respect  to  bearing  and 
range,  see  [19120]. 

Consider  the  application  of  the  EM  algorithm.  In  analog  with  the  two- receiver  case,  let  the 
complete  data  be 


Zi(t) 

• 

z(t) 

y(t)  = 

zmW 

■ 

s(t) 

s(t) 

(62) 


In  the  frequency  domain. 
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*iK) 


XK)=  ZmK)  '[S(a;n) 


where 


S(wn) 


Zj(wn)  =  Qj  e  ‘  S(<Jn)  +  Vj(wn) 

ZMK)“S(«n)+  V^iK) 


For  large  WT,  the  are  uncorrelated  and  thus,  in  the  Gaussian  case,  statistically  indepen¬ 

dent.  Therefore,  the  log-likelihood  of  the  complete  data  is 


LyCe)  *  ^  log  nS(Wn)l  +  X!  Z!  nZi(«n)/S(«n) 


n  i®l 


where  log  ftS(Wn  )J  given  by  (24), 


■J^nH 

|Zi(Wn)-«ie  S(wn)l* 


log  nZi(u;n)/S(u;„)]  =  -  log  x 


log  nzM("„)/s(c.-„)i  •  -  log » Pv^^(u„)  - 
Following  the  derivation  in  Section  Il-B,  we  obtain  the  following  algorithm: 
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E-Step:  Compute 


M-1  p  (0  p  fjo  ^ 


i-1  ” 

t  1 


|S(«n)l*<^^  =  l§^^>K)l*  + 


tvi- 1 

Z“i' 


)2 

Pyj  (‘‘^n ) 


Ps(a»„;#(0)  +  P^^Cwn) 


M-Step: 


Delay  and  amplitude  optimizatioa' 


For  1=1, ...M-1 


Mm  .  Re  ^e  ‘  §^^^(w„)  Zj (<*>„ VPyj («„ )  ■  =««==>  f  jC^+O 


I 


Z- jWn  T  :  ^ )  ♦ 

c  "  §^^)K)ZjK)/Pvj(a;n) 
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Spectral  optimization: 


Min 

e 


n 


log  Ps(wn;«)  + 


Ps(Wn:^) 


=====>  (<+l) 


(as  in  (58)). 

The  algorithm  is  illustrated  in  Figure  3.  Once  again  we  see  that  the  spatial  parameter 
optimization  is  decoupled  from  the  spectral  parameter  optimization.  However,  the  most 
striking  feature  of  the  algorithm  is  that  it  decomposes  the  complicated  multi-dimensional 
optimization  associated  with  the  (M-1)  pairs  of  delay  and  amplitude  parameters  into  the 
(M-l)  optimizations  with  respect  to  each  parameter  pair  separately.  We  therefore  suggest  to 
substitute  the  direct  ML  optimization  that  requires  either  the  joint  maximization  of  (M-l) 
weighted  cross-correlators,  or  the  separate  maximization  of  all  M(M-l)/2  combinations  of 
cross-correlators  (see  [21]),  with  the  iterative  maximization  that  only  employs  (M-l)  cross¬ 
correlators  in  parallel. 


(73) 
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E.  Time- Varying  Etelays 

If  there  is  relative  motion  between  source  and  receivers  the  observed  differential  delay  is  time- 
varying.  Suppose  there  is  only  a  small  change  in  array-source  geometry  during  the  observation  in¬ 
terval  so  that  the  differential  delay  is  essentially  linearly  time-varying,  that  is 

f<t)  =  r  +  r  •  t  (74) 

where  r  is  the  differential  delay  at  t=0,  and  r  is  the  differential  Doppler  time  compression. 

The  observed  signals  are 


Zj(t)  =  s(t)  +  Vj(t) 

(75) 

Zj(t)  =  a  s(t-T(t))  +  Vj(t) 

(76) 

where,  as  before,  s(t)  is  modeled  as  a  sample  function  from  a  WSS  zero-mean  Gaussian  process  with 
the  spectral  density  PjCw;^),  Vj(t)  and  v^Ct)  are  assumed  to  be  indeF>endent  spectrally  white  with 
known  power  level  of  N„/2. 

We  note  that  each  receiver  output  is  stationary,  but  they  are  not  jointly  stationary.  This  is  why 
the  delay  estimation  problem  in  this  case  is  significantly  more  complicated  (see  Knapp  and  Carter 
[5]). 

We  want  to  extend  our  algorithm  to  the  estimation  of  the  vector  parameters 

T 

r 
a 

$ 
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The  complete  data  are  still  given  by 


y(t)  “ 


Zl(t) 

z,(t) 

s(t) 


Tj  <  t  <  Tf 


(78) 


Given  s(t),  Zj(t)  and  Zj(t)  are  statistically  independent.  Therefore,  the  log- likelihood  of  y(t) 
Ti<t<Tf  is  given  by 


Lyff)  =  log  f[s(t)  Ti<t<Tf ]  +  log  flZj(t)  Tj<t<Tf/s(t)  Tj<t<Tf] 

+  log  flzj(t)  Ti<t<Tf/s(t)  Ti<t<Tf  ]  (79) 

where 


log  f[Zj(t)  Ti<t<Tf/s(t)  Ti<t<Tf  j  =  Cj 


fTf 

if  [z,(t)-s(t)]»  dt 
•^0  Jtj 


(80) 


iog  az,(t)  Ti<t<Tf/s(t)  Tj<t<Tf  1 

=  c,  -  [z,(t)  -  s(t-r-r  t)]*  dt  (81) 

^0  Jj. 


where  c^  and  c,  are  independent  of  and  log  fts(t),  Tj<t<Tf)  is  the  log-likelihood  of  s(t)  TjstsTf, 
which  is  given,  under  large  WT  assumption  by 


log  fls(t),  Ti<t<Tf] 


Jog  Jr  Ps(u;n:^)  +  iS(wn)IVPs(wn;«)) 


(82) 


The  E-step  of  the  algorithm  requires  the  conditional  expectation  of  LY(f)  given  the  observed 
Zj(t)  and  Zj(t)  at  the  current  parameter  estimate  Since  z^(t)  and  z,(t)  are  jointly  non-stationary, 
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this  computation  seems  complicated  at  first  glance.  However,  since 
given  to  us,  we  could  take  instead  the  conditional  expectation  with  respect  to  z^(t)  and 

- t]  (83) 

1-r  (0 

V  y 

Since  Zj(t)  and  z^^)(t)  can  be  obtained  from  one  another  by  a  time-scale  (reversible)  operation, 
then  ignoring  end  effects  it  should  make  no  difference. 

Now,  at  the  current  estimate  Zj(t)  and  z^^Ht)  are  jointly  stationary  and  the  required  condi¬ 
tional  expection  is  easily  carried  out. 

We  observe  that  the  expression  in  (80)  is  independent  of  (,  the  expression  in  (81)  depends  only 
on  a,r  and  r,  and  the  expression  in  (82)  depends  only  on  Therefore,  the  algorithm  takes  the 
form; 
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M-step: 


Delay,  Doppler  and  amplitude  optimization 


Max 

r,r 


■  2  I  Zj(t)  s  W(t-r-r-t)dt 
Jtj 


(88) 


z,(t)  (t-r(^+l)-r  (^+l).t)  dt 

Ti 


Jtj 


(89) 


Spectral  optimization 


Min 

9 


log  Ps(wni^)  + 


n 


PsK;^) 


==>^(<+l) 


(90) 


where  Z^^)(wn)  is  the  Fourier  coefficient  of  z^^^(t)  at  frequency  and  ~H-)  denotes  the  inverse 
Fourier  transform  operation. 

The  algorithm  is  illustrated  in  Figure  4.  We  note  that  s(^)(t)  is  the  time-domain  output  of  the 
Wiener  filter. 

We  observe  that  the  £-step  of  the  algorithm  is  essentially  unaffected  by  the  non -stationary  in¬ 
troduced  by  the  time-varying  delay. 

The  spectral  parameter  optimization  is  identical  to  the  stationary  case  and  it  is  carried  out  sep¬ 


arately. 
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The  joint  optimization  of  the  delay  and  Doppler  basically  consists  of  correlating  Zj(t)  with  tiiiie- 
scaled  and  shifted  version  of  the  Wiener  filter  output  s  (^)(t). 

We  point  out  once  again  that  the  algorithm  is  guaranteed  to  converge  monotonically  to  a  sta¬ 
tionary  point  of  the  log-likelihood  function. 

The  extension  to  the  M-receiver  case  is  straightforward.  The  algorithm  decomposes  the  compli¬ 
cated  multi-dimensional  optimization  associated  with  the  (M-1)  triples  of  delay,  Doppler  and  ampli¬ 
tude  parameters  into  (M- 1 )  optimizations  with  respect  to  each  parameter  triple  separately.  Thus,  the 
complexity  of  the  algorithm  is  only  linearly  affected  by  the  number  of  unknown  delay  and  Doppler 
parameters. 


III.  DELAY  ESTIMATION  -  THE  NON-ST ATIONARY  CASE. 


In  this  section,  we  extend  the  scope  by  postulating  a  moving  source  travelling  along  some  unk¬ 
nown  trajectory  and  radiating  a  possibly  non-stationary  noise-like  stochastic  signal. 

A.  Problem  Formulation 

The  continuous-time  waveforms  observed  at  the  receiver  outputs  are 


z^(t)  =  s(t)  +  Vj(t) 

(91) 

Zj(t)  =  as(t-r(t)]  +  Vj(t) 

(92) 

Let  the  signal  s(t)  be  modeled  using  the  following  continuous-time  linear  dynamic  stochastic 


state  equation 
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x(t)  =  F(t;tf)x(t)  +  G(t)w(t)  -oo<t<oo  (93) 

s(t)  =  h’rx(t)  (94) 

where  x(t)  is  the  qxl  state  vector,  w(t)  is  the  rxl  vector  independent  normalized  white  Gaussian 
noise,  and  B  is  the  pxl  vector  unknown  system  (spectral)  parameters.  This  model  covers  a  wide 
range  of  signals  including  continuous-time  all-pole  and  zero-pole  processes,  transient  phenomena, 
etc. 

As  suggested  in  [4]  [22],  the  differential  time- varying  delay  T(t)  is  parametrized  using  Taylor 
series  expansion 


T(t)  =  To  +  Tj-t  +  Tjt*  +  ...  + 

which  identifies  the  coefficients  rj  as 


"j 


1  dirfO 

j!  dti 


t=0 


(95) 


(96) 


The  Tj's  have  direct  physical  meaning;  is  the  differential  delay  at  t=o,  is  the  differential 
Doppler  time  compression  at  t=0.  Higher  order  coefficients  correspond  to  successive  Doppler  deri¬ 
vatives,  all  evaluated  at  t=0.  If  the  observation  time  is  short,  so  that  r(t)  varies  essentially  linearly, 
the  differential  Doppler  is  constant  and  the  series  terminates  after  two  terms.  By  carrying  more 
terms  in  the  series  one  allows  either  longer  observation  periods  or  more  complex  source  manuevers. 

For  practical  reasons,  the  actual  data  are  generated  by  sampling  the  time  functions  in  (91)  and 
(92).  The  resulting  discrete-time  signals  satisfy  the  following  model 


hk  =  s(tk)  +  Vjk 

(97) 

Z2k  =  <»(tk  -  T(tk)]  +V,k 

(98) 

We  suppose  that  and  are  statistically  independent  zero-mean  white  Gaussian  sequences 


with  unknown  spectral  levels  of  <7^*  and  Cj*.  respectively. 
Given  the  discrete  time  observations 


*k  = 


^ik 

Zjk 


k  =  1,2,...N 


we  want  to  find  the  ML  estimate  of  the  vector  parameters 


(99) 


(100) 


where  r  =  (r^.r,...)^  are  the  delay  parameters. 

The  direct  ML  problem  is  significantly  more  complicated  because  of  the  source  motion,  the 
non-stationary  of  the  signal,  and  the  continuous-discrete  nature  of  the  problem. 


B.  Development  of  the  Algorithm 

We  want  to  apply  the  EM  algorithm  to  solve  the  problem.  Motivated  by  the  considerations  in 
[23],  we  choose  as  the  complete  data  the  continuous-time  state  x(t)  Tj  <  t  <  Tf  jointly  with  the 
observations  Zj,  .  The  log-likelihood  of  the  complete  data  is  therefore  given  by 
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LY(e)  =  logf(Zjk/x(t)  Tj  <  t  <  Tf ) 

k=l 

N 

+  ^  logf(Zjk/x(t)  Tj<t<Tf)  (101) 

k»l 

where 


logf(Zjk/x(t)  Tj  <  t  <  Tf)  = 

=  -i  log(2xrf,2)  -  -  s(tk)]*  (102) 

logf(Zjk/x(t)  Tj  <  t  <  Tf)  =  -  I  log(2xaj*)  -  ^  [Zjk  -  c«s(tk  -  T(tk)]]2  (103) 

and  is  the  log-likelihood  of  the  continuous  state  x(t)  Tj  <  t  <  Tf  given  by  (e.g.,  see  [24][25]) 


L_^(«)  =  c  + 


1 

2 


/•Tf  ^ 

x(t)TF  (t;tf)T  [G(t)G{t)T]  dx(t) 

JTi 

/•ff  ^ 

j  Tr  {F(t;tf)T  [G(t)G<t)T]  F(t;«)x(t)  xJi)}dt 


(104) 


where  c  is  a  constant  independent  of  $.  The  symbols  and  Tr(  )  denote  the  pseudo-inverse  and 
the  trace  operations,  respectively.  Substituting  (102)-(104)  into  (101)  and  taking  the  conditional 
expectation  given  Zj,Z2,...Z{sj  at  a  parameter  value  ^(^), 


Q(f;^/))  A  {LY(e)/Zi,ZN) 

=  Cl  +  G(^)(r,a,<Ti*,cTj*)  +  HW(tf) 


(105) 


where  c^  is  a  constant  independent  of 


G(^)(r,Q(,aj2,aj*)  = 
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N 

=  -  ^  Ioga,2  -  ^  ^  [Zjk*  -  2zik  s  W(tk)  +  s2(tk)(^)] 

^  k=I 
N 

-  y  log  (7j*  -  2^  ^  [Zjk*  -  2azji(  s  (^)(tk;r)  +  a2s2(tij;r)(0] 
*  k=l 


where  we  have  denoted  for  convenience  s(t;r)  =  s[t-T(t)]. 

In  view  of  (29),  (30)  and  <105),  the  algorithm  assumes  the  form; 


(106) 


(107) 


E-Step:  Compute 


G(0(r,a.a,2,a,2),  h(^)(^) 


(108) 


M-Step: 


Max  G(^)(r,Q,<7j*,aj^)  ======> 


f(f+l)  S(f+1) 


(109) 


Max 

$ 


(0) 


:> 


(110) 


As  in  the  stationary  case,  we  observe  that  the  spectral  parameter  optimization  (Eq.(llO))  is  dec¬ 
oupled  from  the  optimization  with  respect  to  the  other  parameters.  Now,  for  a  pre-specified  r,  the 
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a,  a*  and  a,*  that  maximize  g(^)()  can  easily  be  found  by  differentiation.  Therefore,  following 
straightforward  algebraic  manipulations,  we  obtain  the  following  algorithm: 

E-Step:  Compute 


x(^)(t)  =  {x(t)/2j,ZN) 


(111) 


x(t)x(t) 


,t(0  _ 


=  E 


lit) 


[x(t)x(t)T'/Zj,...ZN] 


(112) 


{J^  x(t)TF(t;tf)T[G(t)G(t)T]#dx(t)/z,,...ZN ) 


(113) 


M-Step: 


Max 

r 


N 

Zjk  s  (^)(tij;r)]2 


k=l 


A 

=s====>  r 


(«+i) 


(114) 


S(«+0  = 


X 

k=l 


^s*(tk;f(«+‘))(0 

k=l 


(115) 


N 

;^2(/+l)  =  i  ^  (z,k*  -  2z,k  s  (0(tk)  +  s^tk)(<)] 


k=l 


(116) 
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N 

-  jL  ^  s*«)(tk;?('*l)),  (1,7) 

k=l 

Max  h(^)  («)======>  I  (^+  * )  (118) 

$ 

where  we  note  that  the  conditional  expectations  in  (112)  and  (113)  are  needed  for  the  computation 
of  H(^)(fl)  (tq.(107)).  Since  s(t)  =  hTx(t)  (Eq.  (96)),  then 

s(0(t)  =  hTJW(t)  (119) 

s2(t)(^)  =  hT  x(t)x(t)T^^^  h  (120) 

Using  these  relations,  the  conditional  expectations  in  (111)  and  (112)  are  sufficient  to  compute 
the  terms  in  (114)-(117).  The  algorithm  is  illustrated  in  Figure  5. 

Since  the  algorithm  is  based  on  the  EM  method,  it  must  converge  to  a  stationary  point  of 
Lz(f),  the  log- likelihood  of  the  observed  data,  where  each  iteration  cycle  increases  the  likelihood  of 
the  estimated  parameters. 

We  note  that  the  delay  estimates  are  not  subject  to  the  sampling  period,  since  the  maximization 
in  (114)  is  carried  out  in  the  underlying  continuous-time  domain  of  the  propagating  signals.  Hence, 
this  scheme  is  particular  useful  when  we  want  to  estimate  the  delay  with  resolution  that  is  a  fraction 
of  the  sampling  interval. 

If  the  stochastic  system  generating  the  signal  s(t)  is  known  (i.e.,  9  is  known)  we  simply  eliminate 
Eqs.  (113)  and  (118)  from  the  algorithm,  and  use  this  a-priori  information  in  the  computation  of 
(111)  and  (112).  If  the  noise  power  levels  and  are  known  a-priori,  we  simply  eliminate  Eqs. 
(116)  and  (117). 
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The  computation  of  xW(t)  and  x(t)x(t)''^^^  Tj<t<Tf  (Eqs.  (Ill)  and  (112),  respectively)  can  be 
carried  out  by  first  calculating  the  discrete-time  estimates  of  x(t)  and  x(t)x(t)^  using  the  discrete 
Kalman  smoothing  equations,  and  then  exploiting  the  Markovian  nature  of  the  underlying  stochastic 
state  equation  to  interpolate  between  adjacent  time  instances.  Details  of  this  procedure  will  be  pre¬ 
sented  in  the  sequel. 

The  conditional  expectation  of  the  stochastic  integral  in  (113)  can  be  expressed  in  some  impor¬ 
tant  cases  in  terms  of  the  conditional  expectation  in  (112),  e.g.,  in  the  case  of  all-pole  signals  to  be 
described  next.  Otherwise,  it  can  be  approximated  via  discretization  (see  the  considerations  in  [24], 
and  the  computationally  efficient  method  developed  in  [26]). 


All-Pole  Signals 

Let  the  signal  s(t)  be  modeled  as  the  output  of  an  all-pole  filter  driven  by  a  white  Gaussian 
noise.  The  equivalent  state  model  is  given  by  (93)-(94),  where 


F(t;«)  = 


0 

0, 


0 

1 


(i2i) 


G(t)'F  =  [0  0 - OV^  (122) 

and 

hT  =  [1  0 - 0]  (123) 

In  this  setting,  the  signal  s(t)  is  stationary,  however,  since  we  allow  the  signal  source  to  manu  - 
ver,  the  received  signals  are  not  stationary  and  we  could  not  apply  the  algorithm  developed  for  the 
stationary  case.  Also,  we  do  not  assume  long  observation  time,  and  we  relate  to  the  continuous-dis¬ 
crete  nature  of  the  problem. 
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Suppose  that  the  gain  g  in  (122)  is  known,  otherwise  it  could  be  factored  out  of  the  system 
equation  (93)  and  inserted  into  the  measurement  equation  (97)  as  an  unknown  amplitude  scale. 

Substituting  (121)-(123)  into  (104),  we  obtain  (see  [24127]) 


Ly(«)  -  c  + 


i  x(t)dxq(t)  -  *(t)x(t)Tdt  0 


where  0  =  {8^  6^  —8q)^,  and 


- 1.,  <•> 


i=l,2,...(q-l) 


x(t)dxq(t)  i  =  ^ 


^  -  g  (Tf  -  Ti)/2 


where  Xi(t)  denotes  the  i***  component  of  x(t)  (note  that  x(t)  is  a  q-dimensional  vector).  Therefore, 


H«)(tf)^  {L^(J)/z,,z„...zn} 

=  i  tfT  x(t)dXq(t)W  -  x(t)x(t)T(^^  dt  0 


where 


In  (126)  we  have  ignored  the  constant  c.  Observing  that  is  a  quadratic  function  of 

the  maximization  required  in  (118)  can  be  solved  analytically.  We  further  observe  that  the  computa¬ 
tion  of  H(^)(tf)  requires  only  the  computation  of  x(t)x(t)^^\  Therefore,  in  the  case  of  all-pole  sig¬ 
nals,  the  algorithm  assumes  the  form; 


E-Step:  Compute 


J(^)(t)  and  x(t)x(t)T^^^  T;  <  t  <  Tf 


(128) 


M-Step: 


-  Max{Eq.(114))  ===> 
r 

(129) 

-  Compute  using  Eq.(115) 

(130) 

-  Compute  using  Eq.(116) 

(131) 

-  Compute  using  Eq.(117) 

(132) 

-  Compute 
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#(«+!)  = 


(133) 


We  note  that  only  the  estimation  of  the  delay  parameters  involves  maximization.  We  now 
describe  the  computation  of  x(^)(t)  and  x(t)x(t)^^^  in  details. 

Development  of  the  Continuous-Discrete  Smoothing  Equations. 

The  computation  of  x(^)(t)  (Eq.  (Ill))  and  x(t)x(t)^^^  (Eq.  (112))  is  carried  out  by  considering 
the  equivalent  discrete  model  and  calculating  the  discrete-time  estimates  of  the  related  quantities, 
and  then  exploiting  the  Markovian  nature  of  the  underlying  stochastic  system  to  interpolate  between 
adjacent  time  instances.  We  shall  now  describe  the  proposed  procedure  step  by  step.  For  notational 
convenience  we  substitute  by  (. 

The  Equivalent  discrete  model 

Let  the  2N  time  points  tj^,  tjj-T(t|^)  k=l,2,...N  be  arranged  in  increased  order,  and  denote  by 
r  r  the  r^*'  time  instance.  Then,  in  view  of  (97)  and  (98),  the  scalar  measurements  can  be  repre¬ 
sented  as 


Zj.  =  Ar  h^^Xf  +  Pf  Vj.  r=l,2,...2-N  (134) 

where  x^  =  x(rr),  v^  r=l,2,...  are  statistically  independent  normalized  Gaussian  random  variables, 
and 

f  1  ^r^^i  “  ^zv  •••  i) 

(135) 

La  =  {Zii.  ^22.  •••  ZNz) 
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Pt  = 


2r<£  Oj 


(136) 


Now,  following  the  considerations  in  (e.g.,  (25]  [28]),  the  Xf  r=l,2,...  satisfy  the  following  sto¬ 
chastic  difference  equation: 


"r+l  *  *r+l(tf)Xr+  Ur+i  (137) 

where  4»r(tf),  the  discrete- time  state  transition  matrix,  is  specified  by 

4>r(tf)  =  <I>(rr.  Tr-i;  0)  (138) 

where  is  the  continuous-time  state  transition  matrix,  satisfying  the  differential  equation 

=  F(t;Wt.r,S)  t>r  (139) 

with  the  initial  condition  ^(T,r,0)  =  I.  The  r=l,2,...  are  statistically  independent  zero-mean  and 
Gaussian  with  the  covariance  matrix 


where 


Qr(«)  =  Q(t'r.  t'r-i;^) 


Q(t,r.fl)  =  I  $(t.s;«)G(s)G(s)T<I.(t,s;«)Tds 


(140) 


(141) 


Eqs.  (137)  and  (134)  specify  the  equivalent  discrete  model. 
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The  discrete  smoothing  equations 
Define 


Mr|k  =  Ef{Xr/z,,...Zk) 

Pflk  =  (*r  -  -Zk) 

Then,  the  discrete  Kalman  filtering  equations  are  (e.g.,  [18]  [25]); 

Propagation  equations:  For  r=l,2,...2N 


^r|r-l  =  *r  /*r-l|r-l 

“  *r  ^’r-ljr-l  *r^  Qr 

Up-dating  equations:  For  r=l,2,...,2N 


(142) 


(143) 


(144) 

(145) 


/*r|r  =  /*r|r-l  +  •'r  (^r  "  Mrjr-l )  (>46) 

Prir  =  (I-ArkrhT)Pr|r.i  (147) 


where  is  the  Kalman  gain  defined  by 


_ \ _ 


(148) 


To  initiate  the  recursions  in  (144)-(147)  we  must  select  some  initial  conditions  /IqIo  **o|o 
(e.g.,  /iQjo  =  0,  and  Pqi©  =  lA  for  some  f  >  0). 
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At  this  point  we  need  the  definition  of 

^r-l,rjk  “  "  ^r-I|k)  “  /*r)k^^/*i’*2— ^k)  (149) 

Sf-l  =  Pr-I|r-1  P^i'r-l  (*50) 

Then  the  discrete  smoothing  equations  are  (e.g.,  [25]) 

For  r  =  2N,2N-i,...l 

^r-I|2N  = /^r-l|r-l  +  Vl  (^rl2N  '  *r  Mr-l|r-l )  (151) 

Pr-1|2N  =  Pr-l|r-l  +  Sf.i  {Prj2N  -  Pr|r-1  )Sr-I  ^  (152) 

and  (see  Shumway  [29]) 

Pr-l,i12N  =  Sr-2  Pr-l|r-l  +  ^r-2  (Pr-l,r)2N  -  Pr-l|r-l  *r^)  (153) 

with  the  initial  condition 

P2N-1,2N|2N  =  P2N-II2N-1  *2N^  (*  "  '^2N 
The  interpolation  formulae 

Invoking  the  Markovian  nature  of  the  underlying  stochastic  system,  for  Tr-j  <  t  < 

x(t)  =  E^{x(t)/Zj,Zj...ZN} 

=  A(t)  Mr-1|2N  +  ®(*)  Mr12N 


(155) 
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x(t)»(t)^  =  Ef{x(t)x{t)T/Zj,Zj...ZN} 

=  x(t)x(t)T  +  A(t)  Pr-|(2N  A(t)^  +  B(t)  Pr|2N  8(0^ 

+  A(t)  Pr-i,r|2N  8(0”^  +  8(t)  p7.i,,^2N  (156) 

where 

A(t)  =  *(t.rr.,)  -  B(t)  *(rr,rr_i)  (157) 

B(t)  =  CKUr-l)  Wr.t)  CKM  r- 1 )  ♦(*■  r +  QC^r.^  (158) 

C(t)  =  Q(t.r  r- 1 )  -  B(t)*(r  r  ,t)  Q(t,r  r- 1 )  (159) 

where  *(1,7)  and  Q(t,T)  are  defined  in  (139)  and  (141),  respectively.  If  *(t,r,9)  and  Q(t,r,tf)  depend 
on  the  difference  (t-r)  (i.e.,  time  invariant  system)  the  interpolation  operation  can  be  carried  out 
efficiently  using  the  procedure  in  [26]. 

Further  Considerations 


(1)  Motivated  by  the  considerations  in  the  stationary  case,  near  the  point  of  convergence  we 
may  leave  the  spectral  parameters  at  their  current  values  and  perform  a  partial  M-step.  This 
way  will  still  retain  the  monotonic  increase  of  the  likelihood  function,  and  the  convergence 
to  a  stationary  point  is  guaranteed.  From  computational  point  of  view,  it  might  be  prefer¬ 
able  to  do  so,  perhaps  at  the  expense  of  convergence  rate  reduction. 

(2)  An  initial  estimate  of  the  signal  and  its  spectral  parameters  may  be  obtained  by  applying  the 
EM-type  algorithm  proposed  in  [23]  to  the  signal  observed  in  one  receiver  output  (say,  the 
receiver  with  the  higher  SNR).  Based  on  these  estimates,  an  initial  estimate  of  the  delay 
parameters  can  be  obtained.  Then  we  can  switch  to  the  full-scale  EM  algorithm. 

(3)  In  calculating  the  conditional  expectations  (111)-(1I3),  we  may  consider  performing  the  con¬ 
tinuous-time  interpolation  alone  (Eqs.  (1S5)-(156))  while  retaining  the  discrete  state  estimates 
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at  their  current  values  for  some  iterations.  As  proved  in  [30],  this  modification  preserves  the 
basic  properties  of  the  EM  algorithm,  that  is  the  monotonic  increase  of  the  likelihood  func¬ 
tion  on  each  iteration  cycle,  and  the  convergence  to  a  stationary  point  of  that  function.  We 
may  find  this  modification  useful  in  both  achieving  faster  convergence  rates  and  in  simpli¬ 
fying  the  computations  involved  (see  the  example  in  [30]). 

(4)  At  some  iterations,  we  may  leave  the  discrete  state  (signal)  estimates  available  from  the 
higher  SNR  receiver  output  at  their  current  values  while  performing  the  interpolation  opera¬ 
tion  using  the  measurements  of  the  other  receiver.  This  is  another  instance  of  the  algorithm 
proposed  in  [30]  that  may  simplify  the  computations  involved  and  accelerate  the  convergence 
rate. 

(5)  In  calculating  (111)-(113),  we  may  consider  substituting  the  full-scale  smoothing  by  fixed- 
lag  smoothing  or  even  by  filtering,  with  the  obvious  benefit  of  reduced  computations  and 
storage  requirements.  This  is  not  a  variant  of  the  EM  algorithm  in  the  sense  that  we  cannot 
guarantee  the  monotonic  increase  of  the  likelihood  function  or  the  convergence  to  a  station¬ 
ary  point.  However,  this  modification  might  be  particularly  useful  in  situations  where  the 
environment  is  changing  rapidly  and  we  want  an  adaptive  algorithm. 

Extension  to  Pole-Zero  Processes 

A  state  space  representation  of  a  pole-zero  (ARM A)  signal  is  given  by  (93) -(94),  where  F(tf)  is 
given  by  (121)  {$  are  the  so-called  AR  parameters),  G  is  given  by  (122),  and  h"*^  =  (hj...hq)  are  the 
moving  average  (MA)  parameters.  The  vector  unknown  parameters  f  to  be  estimated  is 
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Following  the  considerations  leading  to  (I08H110),  the  algorithm  in  that  case  takes  the  form: 


MaxH(0(tf)  =======>  #(^+1) 


Max  == 

r,a,(7j*,aj*,h 


=====>  f(/+i) 

5^2(/+1)  S^2(<+l)^fi(/+l) 


where  is  given  by  (126),  and 


(r,a,(7j2,(Tj2,h) 


fN 

=  -  ^  log  cTi*  -  ^  ^  (Zjk*  -  2Zjj^  hTJ(0(t,^)  +  hTx(t,^)x(tk)T^^^h] 


y  >08  ^2^  -  ^  ^  [Z2k*  -  2ofZjkhT£(0(t,^;r)  +  Q2hTx(tk;r)x(tk;r)T^^^h] 


where  x(t;T)  =  x(t-T(t)). 


From  (161)-(162),  we  see  that  the  optimization  with  respect  to  (w.r.t)  the  AR  parameters  0  is 
decoupled  from  that  associated  with  the  delay  r.  It  is  also  decoupled  from  the  optimization  w.r.t  the 
MA  part  of  the  spectral  parameters.  However,  the  latter  must  be  performed  jointly  with  the  delay 
r.  Since  G(^)(r,a,aj*,aj*,h)  in  (163)  is  quadratic  in  h,  the  maximization  w.r.t  to  h  can  easily  be 
obtained  for  any  specified  T,a,Oj^,aj^.  This  suggests  the  following  two-step  procedure: 
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1)  Up-date  r,a,aj*,(7j*  using  (114HII7),  (I19H120)  with  h=fi(0 

2)  Up-date  h  via 

MaxCW  (?(«+»),  h)  =======>  6(^+1)  (164) 

h 

This  two-step  maximization  preserves  the  monotonic  increase  of  the  likelihood  and  the  conver¬ 
gence  to  a  stationary  point  of  it,  prehaps  at  some  moderate  reduction  in  speed  of  convergence. 

The  idea  of  decomposing  the  spectral  optimization  into  separate  optimizations  w.r.t  to  the  AR 
and  the  MA  parts  is  due  to  Musicus  and  Lim  [31].  In  fact,  this  idea  could  also  be  incorporated  into 
the  algorithms  presented  in  Section  II  for  the  stationary  case. 

C.  Simulation  Results 

To  demonstrate  the  performance  of  the  algorithm  we  have  considered  the  following  example. 
The  source  radiates  a  continuous- time  all-pole  process,  whose  spectrum  is 

“  w<-2.4375u;*+2.25 

The  corresponding  state  space  model  of  the  signal  is  given  by 


s(t)  =  (l  O)x(t) 


The  only  unknown  parameter  to  be  estimated  is  the  delay  r.  The  data  were  generated  using 
r=3.1  (the  true  delay),  and  sampled  at  a  constant  rate  of  At  =  1.0.  The  noise  variances  are  = 


0.25  (0.5)  and  =  0.50  (1.0),  a  =  1  and  N  =  220  data  points.  Using  exhaustive  search,  the  ML 
estimate  is  found  to  be  at  r  ~  3.02  (3.0).  An  approximate  CRLB  is  obtained  by  calculating 

the  sample  variance  of  the  score  (see  Eq.  (167)  in  the  sequel)  using  monte-carlo  simulations.  We 
find  CRLB  ~  0.09  (0.149).  In  Figure  6  we  have  tabulated  the  results,  where  the  initial  guess  is 

r(o)  =  4.0.  We  observe  that  after  few  iterations,  the  algorithm  essentially  converges  within  the 
CRLB,  to  the  ML  estimate  of  the  delay.  For  reference,  we  have  also  calculated  the  log-likelihood 
function  L2(r)  along  the  iterations.  (L2(r)  is  computed  using  the  Kalman  filter  equations). 

IV.  GRADIENT-BASED  ALGORITHMS  AND  PERFORMANCE  EVALUATION 

As  shown  in  [16],  the  rate  of  convergence  of  the  algorithm  (near  the  point  of  convergence)  is 

+  h.'t 

geometrical  (exponential),  depending  on  the  fraction  of  the  complete  data^an  be  predicted  using  the 
observed  (incomplete)  data.  If  that  fraction  is  small  (e.g.,  under  low  SNR  conditions,  and/or  low 
sampling  rates)  the  rate  of  convergence  tends  to  be  slow,  in  which  case  we  want  to  use  the  Gaussian 
method  or  the  Newton-Raphson  or  some  other  gradient  search  algorithm.  These  methods  require 
the  computation  of  the  log-likelihood  gradient  (score)  and  the  computation  of  the  log-likelihood 
Hessian  or  the  Fisher's  information  matrix  (FIM),  or  some  approximation  of  which.  The  FIM  can 
further  be  used  to  assess  the  mean  square  error  of  the  ML  parameter  estimates. 

The  switching  from  the  EM  algorithm  to  a  gradient-based  method  can  be  facilitated  using  the 
following  identity,  first  presented  by  Fisher  (1925,  [32]),  and  recently  by  Dempster  et  al.  [16],  Louis 
[33],  and  Meilijson  [34]; 

^Lz(0-Ej|^  LY(e)/z.,z . zn|  = 
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‘  I  «{.«•) 


(165) 


Using  (165),  the  observed  data  score  can  be  computed  by  taking  the  conditional  expectation  of 
the  complete  data  score.  A  proof  of  Fisher's  identity  when  part  of  the  complete  data  are  continu¬ 
ous-time  processes  is  presented  in  [35]. 

To  simplify  he  exposition,  we  concentrate  on  the  all-pole  signal  case.  In  view  of  (105)  and 
(126), 


s  LzCf)  -  I  hW(») 


f'^f  r’^f 

J  x(t)Tdxq(t)  -  J  x(t)x(t)Tdt 


(166) 


where  (^)=  E^UO/Zi.Zjv-zn)-  Similarly,  in  view  of  (105)  and  (106), 


e(0  =  ( 


N  r 

-iZ-' 


azjk  s(tk;r)'a*s(tk;r)s(tk; 


;r)j  j=0,l,2,. 


k=l 


(167) 


ALz(f).  AG(0(,,a.<T.V,.) 


N 


=  ^  ihk  ^tk;»‘)  -  «*(tk;*’)] 


k»l 


(168) 
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f  (0  =  « 


N 


k=l 


(169) 


^  Lz(f)=  ^G«)  (w,V,*) 


do,^ 


^u)  =  e 


N 

^  ^  X  ^  “*  s*(tk;r)l 

*  *  k=l 


(170) 


where  we  note  that  in  the  case  of  all-pole  signals  s(t)  =  Xj(t)  and  s(t)  =  ds(t)/dt  =  Xj(t),  the  second 
component  of  the  qxl  state  vector  x(t),  provided  that  q>2  (otherwise  the  process  s(t)  is  not  differen¬ 
tiable,  and  the  regularity  conditions  imposed  upon  Fisher's  identity  are  violated  (see  [35])) .  With  this 
convention,  the  various  conditional  expectations  indicated  in  (166)  -  (170)  are  precisely  those 
required  by  the  EM  algorithm,  and  they  involve  the  continuous-discrete  smoothing  equations.  In 
fact,  only  the  discrete  smoothing  equations  are  needed  for  the  computation  of  (167)  (170). 

The  Fisher's  identity  can  also  be  applied  to  the  stationary  case.  There,  however,  we  could  get 
at  the  same  result  by  the  direct  differentiation  of  Lz(f)  (Eq.(8))  (with  the  exception  of  stationary 
signals  and  linearly  time-varying  delays,  where  Fisher's  identity  allows  us  to  differentiate  first  and 
then  perform  the  indicated  time-scale  operation.  By  that  we  retain  the  stationarity  of  the  underly¬ 
ing  processes  and  the  score  computation  becomes  relatively  easy^see  the  considerations  in  Section  11- 
E).  In  the  case  discussed  here,  the  direct  differentiation  of  Lz(C)  implies  a  significantly  more  com¬ 
plicated  way  of  computing  the  score,  although  the  final  result  should  be  the  same.  The  direct  dif¬ 
ferentiation  of  Lz({)  with  respect  to  0  requires  the  computation  of  the  continuous-discrete  filtering 
equations  [25]  and  their  derivatives  with  respect  to  the  components  of  0  (the  so-called  sensitivity 
derivatives  [36]).  Thus,  if  is  a  p-dimensional  vector,  the  direct  approach  requires  roughly  the 
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equivalent  of  (p+1)  continuous-discrete  filters.  Score  evaluation  based  on  the  equivalent  discrete 
model  is  even  more  complicated  since  it  requires  the  differentiation  of  the  discrete  filtering  equa¬ 
tions  and  the  associated  matrices  with  respect  to  9.  The  direct  differentiation  with  respect  to  r  may 
not  even  be  expressed  analytically,  and  one  is  forced  to  substitute  the  derivatives  with  finite  differ¬ 
ences. 

Using  (166)  -  (170)  we  are  able  to  compute  the  score  in  a  closed-form  with  a  computation  that 
only  employs  the  Kalman  smoother.  We  note  that  the  continuous-discrete  smoothing  is  performed 
by  going  through  the  equivalent  discrete  model,  but  there  is  no  need  to  differentiate  its  matrices,  as 
is  needed  in  the  direct  computation  of  the  score.  We  further  note  that  the  smoothed  error  covari¬ 
ance  matrix  can  be  precomputed,  a  feature  that  can  be  exploited  for  efficient  computations  and 
approximations  of  the  score.  For  example,  pre-computation  of  the  error  covariance  matrix  may  in¬ 
dicate  whether  approximation  can  be  made  by  performing  filtering  alone  rather  than  the  full  smo¬ 
othing  with  the  obvious  benefit  of  reduced  computations  and  storage  requirements. 

Hessian  Evaluation 

The  log-likelihood  Hessian 


Hz(0=  a*Lz(e)/ae^  (171) 

is  computed  by  differentiating  the  expression  for  the  score  (Eqs.  (166)-(170))  with  respect  to  the 
components  of  The  derivatives  with  respect  to  9  result  a  set  of  forward-backward  sensitivity 
equations  that  require  roughly  the  equivalent  of  (p+1)  continuous-discrete  Kalman  smoothers.  The 
derivatives  with  respect  to  r  are  not  accessible  since  r  is  implicitly  induced  in  the  Kalman  smoothing 
equations.  In  practice,  the  derivatives  with  respect  to  r  may  be  approximated  by  finite  differences. 
We  perturb  one  coordinate  of  r  at  a  time  and  compute  the  resulting  score  at  the  perturbed  parame¬ 
ter.  If  r  is  a  m-dimensional  vector,  this  approximation  requires  the  computation  of  the  smoothing 
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equations  at  (m+1)  closely  spaced  values  ot  r,  a  task  that  can  be  simplified  by  pre-computation  of 
the  smoothed  error  covariance  matrix.  The  same  numerical  procedure  can  be  applied  with  respect 
to  the  components  of  9.  Also,  pre-computation  of  the  smoothed  error  covariance  matrix  may  indi¬ 
cate  if  further  approximation  can  be  made  by  performing  filtering  alone  rather  than  the  full  smo¬ 
othing,  resulting  in  a  simple  recursive  approximation  of  the  Hessian.  Another  approximation  of  the 
Hessian,  based  on  score  computations  will  be  presented  in  the  sequel. 

FIM  Evaluation 

The  FIM  can  be  used  to  assess  the  mean  square  errors  of  the  resulting  ML  parameter  estimates. 
It  can  further  be  used  in  conjunction  with  the  scoring  algorithm,  which  is  a  variant  of  the  Newton - 
Raphson  method  where  the  Hessian  is  approximated  by  the  FIM,  with  a  minus  sign. 

Assuming  that  the  source  signal  and  the  additive  noises  are  stationary,  and  that  the  observation 
interval  is  long  compared  with  the  correlation  time  (inverse  bandwidth)  of  the  signal  and  the  noises, 
the  FIM,  computed  on  the  basis  of  continuous-time  observations,  assumes  the  form  [4]  [22] 

/  aLz(£)T  aLz<«  1 

\  'sr  I 

0 

0  h 

where 

,  A  ^  JaLz(e)T  3Lz(f)| 

[—9^ - 


(172) 


(173) 


Closed  form  expressions  for  and  Jj  can  be  found  in  [6]  [22].  This  result  is  an  extension  of 
the  result  in  (15)  for  the  case  of  non-stationary  (moving)  sources.  It  asserts  that  the  quality  of  the 
delay  estimates  is  unaffected  by  lack  of  knowledge  of  the  spectral  parameters,  and  vice  versa. 
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However,  this  result  critically  depends  on  the  stationarity  of  the  source  signal  and  the  large  time- 
bandwidth  assumptions.  For  moderate  observation  intervals,  the  delay  and  spectral  estimates  are  sta¬ 
tistically  correlated,  the  FIM  is  no  longer  block  diagonal,  and  its  computation  becomes  more  compli¬ 
cated.  If,  in  addition,  we  allow  non-stationary  signals,  the  direct  computation  of  the  FIM  becomes 
exceedingly  more  complicated  and,  as  far  as  we  know,  it  has  not  been  attempted  yet. 

By  computing  the  second  moment  (covariance)  of  the  score  (Eqs.  (166)-(170)),  we  are  able  to 
evaluate  the  FIM  even  for  the  non-stationary  case.^  To  simplify  the  exposition,  we  concentrate  on 
the  delay  estimation  problem,  and  assume  that  tf,  a,  and  Oj*  are  known. 

From  (167), 


Nr"  "^1 

X!  *k  [^hk  -  ‘»(‘k;*‘))  s(tk;*-) 


=  Ci  -  ^  ^  t|^  s(tk;r) 


where  cj  is  a  constant  independent  of  Zj,Zj,...z^ 

a  Q 

Taking  the  expectation  of  the  product  —  L7{t)  L7(r)  and  invoking  the  zero-mean  pro- 

vT"  I  dT y 

perty  of  the  score,  that  is  Ey{  —  L2(f)}=0  to  eliminate  cj,  the  i,j-element  of  the  FIM  is  given  by 


JuW  =  Er||:LzW^Lzw|- 


1  We  note  in  passing  that  the  rationale  of  Fisher's  identity  leads  also  to  the  reversibility  theorem, 
used  by  Weinstein  [4]  to  derive  the  FIM  for  the  moving  source,  WSS  signal  case. 
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m=l 


/\ 

Vjm  s(tm;r)  [cj 


a 


A  ^ 


(175) 


Using  the  well-known  formula  for  the  expectation  of  the  product  of  four  zero-mean  Gaussian 
random  variables,  and  invoking  again  the  zero-mean  property  of  the  score  to  eliminate  cj,  we 
obtain 

N  N - -  - 

l=l 

where  (•)  ^  Ef(-).  Recall  that  for  all-pole  signals  s(t)=Xj(t),  the  second  component  of  the  state 
vector.  Hence, 

N  N 


e=i 

where  Xj(t;r)  =  Xj(t-r(t)). 

The  above  result  corresponds  to  sampled  data.  We  have  therefore  taken  into  consideration  the 
loss  of  information  due  to  aliasing  effects. 

Observing  that  v^j^  =  z^j^  -  ah^x(t(^;r),  the  elements  of  the  FIM  are  expressible  in  terms  of 
E^{x(tm;f)x(t/;f)^}  =  Pn;';  the  covariance  matrix  of  the  smoothed  estimates  of  the  state. 

The  expression  for  the  FIM  involves  the  double  sum  over  N,  the  number  of  data  points,  and 
therefore  tends  to  be  computationally  expensive.  Successive  approximat  ions  of  the  FIM,  based  on 
substituting  the  full-scale  smoothed  estimates  Xj(t|^;r)  and  v^j^  with  fixed-lag  smoothing  of  increas¬ 
ing  order,  can  be  obtained  following  the  considerations  in  [37], 
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am  ^^2' 


■  ^2l  X2(*m;*’)  +  ''2m  X2(lmi»’)]  0^^) 
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An  alternative  approximation  of  the  FiM  can  be  obtained  by  decomposing  the  score  into  condi¬ 
tional  scores 


aLz(€)  ^  aLrir-i(0 
di  "  Lu  Sf 
r-1 

where 


(178) 


log  r-l 

Lr1r-l(f)=  (179) 

log  f(zr/Zr,i....Zj;()  r=2,3,...N 

where  f(-)  stands  for  the  appropriate  probability  density  function  and  the  z^'s  are  the  scaler  meas¬ 
urements  defined  in  £q.  (134).  Eq.  (178)  is  a  martingale  representation  of  the  score  since  the  incre¬ 
ments  are  the  conditional  scores,  and  therefore  have  zero  conditional  expectation  given  the  past. 
Following  [34],  we  propose  the  following  approximation  of  the  FIM: 


T 

1  *  r 
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■ 

N 

1 

N 

J({) 

N  Z 

r=l 

£-1 

r=l 


This  estimate  is,  in  fact,  the  sample  covariance  matrix  of  the  conditional  scores,  where  the 
actual  FIM  is  the  covariance  matrix  of  the  score.  The  estimate  may  also  be  used,  with  a  minus  sign, 
to  approximate  the  Hessian.  Since  the  increments  of  a  martingale  are  uncorrelated,  the  expected 
value  of  the  sum  of  squares  of  the  conditional  scores  (the  first  term  on  the  second  line  of  (180))  is 
an  unbaised  estimate  of  the  covariance  matrix  of  their  sum,  which  is  the  FIM.  The  expected  value 
of  the  second  term  of  that  expression  equals  the  FIM  divided  by  N.  Therefore,  J(f)/(N-1)  has  the 
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same  expectation  as  FIM/N  (i.e.,  "asymptotic  unbiasedness").  Furthermore,  consistency  of  the  esti¬ 
mate  can  be  shown  under  stationarity  and  ergodicity  of  the  martingale-difference  increments  (i.e., 
the  conditional  scores),  e.g.,  see  [38]  [39].  We  note  that  under  the  usual  regularity  conditions  for  ML 
to  apply,  the  consistency  will  be  preserved  if  we  replace  f  by  its  ML  estimate. 

Now,  since 


f(Zr/Zr-l 


f(2r,...z^;0 

f(Zr.  I  ,...Zj;e) 


(181) 


it  immediately  follows  that 


^  (182) 

where  Lf({)  denotes  the  log-likelihood  function  of  the  observed  data  Zj,...Zf.  Substituting  (182)  into 
(180),  the  proposed  estimate  can  be  computed  using  (166)-(170). 
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V.  CONCLUSIONS 

We  have  developed  computationally  efficient  iterative  algorithms  for  finding  the  Maximum 
Likelihood  estimates  of  the  delay  and  spectral  parameters  of  a  noise-like  Gaussian  signal  radi¬ 
ated  from  a  common  point  source  and  observed  by  two  or  more  spatially  separated  receivers. 
We  first  consider  the  stationary  case  in  which  the  source  is  stationary  (not  moving)  and  the 
observed  signals  are  modeled  as  wide  sense  stationary  processes.  We  then  extend  the  scope  by 
considering  a  non-stationary  (moving)  source  radiating  a  possible  non-stationary  stochzistic 
signal.  In  that  context,  we  address  the  practical  problem  of  estimation  given  discrete-time 
observations.  We  also  present  efficient  methods  for  calculating  the  log-likelihood  gradient 
(score),  the  Hessian,  the  Fisher’s  information  matrix  under  stationary  and  non-stationary 
conditions. 
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Figure  Captions 


Figure  1:  Two-channel  time  delay  estimator  -  the  stationary  case. 


Figure  2:  Simulation  results. 


Figure  3:  The  multi-receiver  vector  delay  estimator. 


Figure  4:  Two-channel  time  delay  and  Doppler  estimator. 


Figure  5:  Two-channel  time  delay  estimator  -  the  non-stationary  case. 


Figure  6: 


Simulation  results. 


Figure  1  :  TWO- CHANNEL  TIME  DELAY  ESTIMATOR  -  THE  STATIONARY  CASE 
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Figure  2:  SIMULATION  RESULTS 


Figure  3:  THE  MULTI  -  RECEIVER  VECTOR  DELAY  ESTIMATOR 


Figure  4  :  TWO-CHANNEL  TIME  DELAY  AND  DOPPLER  ESTIMATOR 


5  :  TWO-CHANNEL  TIME  DELAY  ESTIMATOR -THE  NON-STATION  ARY  CASE 
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